# Put any packages you want here

A Florida health insurance company wants to predict annual claims for individual clients. The owner wishes to charge an actuarially fair premium to ensure a normal rate of return. To do so, the company pulls a random sample of 50 customers, collects each customer’s health care expenses from the last year, and compares them with what is known about each customer’s plan.

The data on the 50 customers in the sample is as follows:

  • Charges: Total medical expenses for a particular insurance plan (in dollars)
  • Age: Age of the primary beneficiary
  • BMI: Primary beneficiary’s body mass index (kg/m²)
  • Female: Primary beneficiary’s birth sex (0 = Male, 1 = Female)
  • Children: Number of children covered by health insurance plan (includes other dependents as well)
  • Smoker: Indicator if primary beneficiary is a smoker (0 = non-smoker, 1 = smoker)
  • Cities: Dummy variables for each city (WinterSprings, WinterPark, Oviedo), with Sanford as the default (reference) city

Answer the following questions using complete sentences and attach all output, plots, etc. within this report.

For this assignment, ignore the categorical variables (Female, Smoker, and the city dummies).

Question 1

Perform univariate analyses on the quantitative variables (center, shape, spread). Include descriptive statistics and histograms. Be sure to use terms discussed in class such as bimodal, skewed left, etc.

Inspecting Data

str(Insurance)
## tibble [50 × 9] (S3: tbl_df/tbl/data.frame)
##  $ Charges      : num [1:50] 9145 7441 12143 3260 19023 ...
##  $ Age          : num [1:50] 52 45 60 31 39 25 25 57 34 42 ...
##  $ BMI          : num [1:50] 36.7 30.2 25.7 20.4 18.3 ...
##  $ Female       : num [1:50] 0 0 0 0 1 1 1 1 0 0 ...
##  $ Children     : num [1:50] 0 1 0 0 5 1 0 2 1 2 ...
##  $ Smoker       : num [1:50] 0 0 0 0 1 0 1 0 1 0 ...
##  $ WinterSprings: num [1:50] 0 0 0 0 0 0 0 0 0 0 ...
##  $ WinterPark   : num [1:50] 0 0 1 0 0 1 0 0 0 1 ...
##  $ Oviedo       : num [1:50] 1 1 0 1 1 0 1 1 0 0 ...
Insurance$Female <- NULL 
Insurance$WinterPark <- NULL
Insurance$WinterSprings <- NULL
Insurance$Oviedo <- NULL
Insurance$Smoker <- NULL

Insurance %>%
  tbl_summary(
    statistic = list(
      all_continuous() ~ c("{mean} ({sd})",
                           "{median} ({p25}, {p75})",
                           "{min}, {max}"),
      all_categorical() ~ "{n} / {N} ({p}%)"
    ),
    type = all_continuous() ~ "continuous2"
  )
Characteristic       N = 50¹
Charges
  Mean (SD)          12,142 (11,317)
  Median (IQR)       8,333 (4,360, 13,720)
  Range              2,494, 55,135
Age
  Mean (SD)          42 (13)
  Median (IQR)       40 (30, 53)
  Range              23, 64
BMI
  Mean (SD)          28.7 (5.6)
  Median (IQR)       28.0 (25.2, 32.2)
  Range              16.8, 42.1
Children
  0                  17 / 50 (34%)
  1                  14 / 50 (28%)
  2                  12 / 50 (24%)
  3                  6 / 50 (12%)
  5                  1 / 50 (2.0%)
¹ n / N (%)

plot_ly(x = Insurance$Age, type = "histogram", alpha = 0.6) %>% 
  layout(title = 'Distribution of Age',
         xaxis = list(title = 'Age of the primary beneficiary'),
         yaxis = list(title = 'Count'))
plot_ly(x = Insurance$Children, type = "histogram", alpha = 0.6) %>% 
  layout(title = 'Distribution of Children',
         xaxis = list(title = 'Number of children covered by health insurance plan (includes other dependents as well)'),
         yaxis = list(title = 'Count'))
plot_ly(x = Insurance$BMI, type = "histogram", alpha = 0.6) %>% 
  layout(title = 'Distribution of BMI',
         xaxis = list(title = 'Primary beneficiary’s body mass index (kg/m2)'),
         yaxis = list(title = 'Count'))

Jessica: This above is using data from star wars

Question 2

Perform bivariate analyses on the quantitative variables (direction, strength and form). Describe the linear association between all variables.
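
One possible starting point for this question is a correlation matrix (direction and strength) paired with a scatterplot matrix (form). The sketch below assumes `Insurance` is the data frame inspected with `str()` above and still contains the four quantitative columns; `quant_vars` and `bivariate_summary()` are hypothetical names introduced only for illustration.

```r
# Sketch only: pairwise Pearson correlations for the quantitative variables.
# Assumes `Insurance` has numeric columns Charges, Age, BMI, and Children.
quant_vars <- c("Charges", "Age", "BMI", "Children")

# Hypothetical helper: returns the correlation matrix rounded to 2 decimals.
bivariate_summary <- function(df, vars = quant_vars) {
  round(cor(df[, vars], use = "complete.obs"), 2)
}

# Run only when the data set is actually loaded in the session.
if (exists("Insurance")) {
  print(bivariate_summary(Insurance))
  # Scatterplot matrix to judge form (linear vs. curved) and spot outliers.
  pairs(Insurance[, quant_vars], main = "Scatterplot matrix")
}
```

The sign of each correlation gives the direction, its magnitude gives the strength, and the scatterplots show whether a straight line is a reasonable description of the form.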

Question 3

Generate a regression equation in the following form:

\[\text{Charges} = \beta_{0}+\beta_{1}\cdot\text{Age}+\beta_{2}\cdot\text{BMI}+\beta_{3}\cdot\text{Children}\]

Also write out the fitted regression equation cleanly in this document.
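
A model of this form could be estimated by ordinary least squares with `lm()`. This is a minimal sketch, assuming `Insurance` contains the columns Charges, Age, BMI, and Children as shown earlier; `fit_charges_model()` is a hypothetical helper name, not part of the assignment.

```r
# Sketch only: fit Charges on Age, BMI, and Children by ordinary least squares.
# Hypothetical helper; assumes `df` has the four columns named in the formula.
fit_charges_model <- function(df) {
  lm(Charges ~ Age + BMI + Children, data = df)
}

# Run only when the data set is actually loaded in the session.
if (exists("Insurance")) {
  model <- fit_charges_model(Insurance)
  print(summary(model))         # estimates, standard errors, R-squared
  print(round(coef(model), 2))  # beta_0 through beta_3 for the written equation
}
```

The rounded coefficients can then be substituted for the betas when writing out the equation.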

Question 4

An eager insurance representative comes back with a potential client. The client is 40, their BMI is 30, and they have one dependent. Using the regression equation above, predict the amount of medical expenses associated with this policy. Provide a 95% confidence interval as well.
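
The prediction and its interval can be obtained with `predict()` on a fitted `lm` object. The sketch below assumes the same model formula as Question 3 and that `Insurance` is loaded; `new_client` and `predict_charges()` are hypothetical names. It uses `interval = "confidence"`, which describes the mean charges for clients with this profile; `interval = "prediction"` would instead give the wider interval for a single client.

```r
# Sketch only: predicted charges for the client described in the question.
# Values come from the prompt: age 40, BMI 30, one dependent.
new_client <- data.frame(Age = 40, BMI = 30, Children = 1)

# Hypothetical helper: point estimate plus confidence limits (fit, lwr, upr).
predict_charges <- function(model, newdata, level = 0.95) {
  predict(model, newdata = newdata, interval = "confidence", level = level)
}

# Run only when the data set is actually loaded in the session.
if (exists("Insurance")) {
  model <- lm(Charges ~ Age + BMI + Children, data = Insurance)
  print(predict_charges(model, new_client))
}
```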